Abstract. Following a remark of Lawvere, we explicitly exhibit a particularly
elementary bijection between the set T of finite binary trees and the set T^7 of
seven-tuples of such trees. "Particularly elementary" means that the application of
the bijection to a seven-tuple of trees involves case distinctions only down to a
fixed depth (namely four) in the given seven-tuple. We clarify how this and similar
bijections are related to the free commutative semiring on one generator X subject
to X = 1 + X^2. Finally, our main theorem is that the existence of particularly
elementary bijections can be deduced from the provable existence, in intuitionistic
type theory, of any bijections at all.



Introduction 

This paper was motivated by a remark of Lawvere [8], which implies that there
is a particularly elementary coding of seven-tuples of binary trees as single binary
trees. In Section 1, we explicitly exhibit such a coding and discuss the sense in
which it is particularly elementary. In Section 2, we discuss the algebra behind
this situation, which explains why "seven" appears here. Section 3 connects, in a
somewhat more general context, the algebraic manipulations of Section 2 with the
elementary codings of Section 1. Finally, in Section 4, we prove a meta-theorem
saying that such particularly elementary constructions can be extracted from
existence proofs carried out in the much more liberal context of constructive type
theory.

Throughout this paper, we use tree to mean specifically a finite binary tree in 
which the immediate successors (= children) of any node are labeled left and right. 
Even if a node has only one child, the child must have a label. We admit the empty 
tree, denoted by 0, but every non-empty tree has a unique root, from which every 
node can be reached by repeatedly passing to children. The depth of a node is 
defined as the number of nodes on the path joining it to the root (so the root has 
depth 1), and the largest depth of any node is the depth of the tree. (If we need to 
assign a depth to the empty tree, we assign 0.) As this terminology suggests, we 
visualize trees as growing downward, the root being at the top. We use the notation
[t_1, t_2] for the tree consisting of a root, a left subtree (consisting of the left child,
if any, and all its descendants) isomorphic to t_1, and a right subtree isomorphic to
t_2. For example, [0, 0] consists of just the root. By the leftward path of a tree, we
mean the set of those nodes such that neither the node nor any ancestor of it is a
right child, i.e., the set consisting of the root (if any), its left child (if any), that
child's left child (if any), etc.
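
For concreteness, the conventions above can be modeled in a few lines of Python.
This is only an illustrative sketch: the encoding of a tree as None or a pair, and
the names EMPTY, node, depth, and leftward_path_length, are assumptions made here
and do not appear in the paper.

```python
# A tree is either EMPTY (the empty tree, written 0 in the text)
# or a pair (left, right) standing for [left, right].
EMPTY = None

def node(left=EMPTY, right=EMPTY):
    """The tree [left, right]: a root with the given left and right subtrees."""
    return (left, right)

def depth(t):
    """Depth of a tree; the empty tree gets 0, so a lone root has depth 1."""
    if t is EMPTY:
        return 0
    left, right = t
    return 1 + max(depth(left), depth(right))

def leftward_path_length(t):
    """Number of nodes on the leftward path: the root, its left child, and so on."""
    n = 0
    while t is not EMPTY:
        n += 1
        t = t[0]
    return n

root_only = node()                              # [0, 0]
example = node(node(node(), EMPTY), node())     # [[[0, 0], 0], [0, 0]]
```

With this encoding, equality of trees is ordinary equality of nested tuples, which
is convenient when checking that a map between trees is a bijection.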
1. A Very Explicit Bijection

Our first theorem is due to Lawvere, who mentions it (though without proof and 
not quite in the same form) in [8]. 

Theorem 1. There is a very explicit bijection between the set of all seven-tuples 
of trees and the set of all trees. 

Before we prove the theorem, we must explain the meaning of "very explicit," for 
without this phrase the theorem is trivial. The set T of all trees (as defined in the 
introduction) is clearly countably infinite and is therefore in one-to-one
correspondence with the set T^7 of seven-tuples of trees (and also with T^k for any
finite k ≥ 1). Since a bijection from T to the set of natural numbers can be given
explicitly, so can the bijection T^7 → T. This is not the content of the theorem.
"Very explicit" means, roughly, that the value of the bijection f at a seven-tuple
t = (t_1, ..., t_7) of trees can be determined by (1) inspecting these seven trees
down to a depth n that depends only on f, not on the particular trees, (2) depending
on these seven partial trees, constructing a part p of f(t), and (3) taking the
subtrees that were below depth n in the t_j's (and were thus ignored at step (1)) and
attaching them to certain leaves of p. In other words, any structure occurring below
depth n in the t_j's is simply copied into f(t); the "real work" done by f involves
the t_j's only to depth n.

For a more precise definition of "very explicit" (which the impatient reader can 
skip for now, as it will not be needed until Section 3), first define a pattern to 
be a tree in which some of the leaves are labeled with distinct symbols. (This 
labeling has nothing to do with the left-right labeling that is part of the structure 
of every tree.) An instance of a pattern is obtained by replacing each labeled leaf 
by some tree; the function assigning to each label the corresponding tree is called 
the substitution leading to the instance. (Note that the substitution can have the 
empty tree as a value; the corresponding labeled node is then removed from 
the pattern when the instance is formed.) Similarly, we define 7-patterns to be 
seven-tuples of patterns with all labels distinct, and we define their instances under 
substitutions, which are seven-tuples of trees, by replacing every labeled node by 
the image of its label under the substitution. A very explicit function f : T^7 → T is
given by (1) a finite indexed family (p_i)_{i∈I} of 7-patterns such that every seven-tuple
t of trees is an instance of exactly one p_i (under exactly one substitution) and (2)
a family (q_i)_{i∈I} of patterns, indexed by the same I, such that each q_i contains the
same labels as p_i. To apply f to a seven-tuple t of trees, find the unique p_i and the
unique substitution yielding t as an instance, and then apply the same substitution
to q_i.

A non-trivial example will occur in the proof of Theorem 1. For a trivial but 
instructive example, note that there is a very explicit function T^2 → T sending
every pair (t_1, t_2) of trees to the tree [t_1, t_2] consisting of a root with left subtree t_1
and right subtree t_2. (In the notation of the preceding definition, it is given by a
singleton I, so we can omit the subscripts i; by p consisting of two one-point trees
with the points labeled, say, 1 and 2; and by q a tree consisting of the root, a left
child labeled 1, and a right child labeled 2.) This very explicit map is one-to-one
but not quite surjective, as the empty tree is not in its range. There is also a very
explicit injection in the opposite direction, T → T^2, sending each tree t to the
pair (t, 0). One can apply the Cantor-Schröder-Bernstein argument to this pair
of injections to obtain a bijection T^2 → T, but the result is not very explicit. In
fact, the bijection one obtains is just like the injection T^2 → T described above
except that, if the output is just a leftward path (i.e., a tree where no node has
a right child) then the path is shortened by one node. Unlike the injections, this
bijection may need to look down to arbitrary depth in its input trees to determine
whether the output is a leftward path. We shall see in Section 3 that there is no
very explicit bijection T^2 → T.
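
The pattern-and-substitution definition of a very explicit function can be sketched
in Python, using the pairing map (t_1, t_2) ↦ [t_1, t_2] described above as the example
family. This is an illustrative sketch only: the encoding (None for the empty tree,
a pair for a node, a string for a labeled leaf) and the names match, substitute,
apply_very_explicit, PAIRING, and SWAP are assumptions made here, and the function
handles k-tuples for any k rather than only seven-tuples.

```python
def match(pattern, tree):
    """Return the substitution making `tree` an instance of `pattern`, or None."""
    if isinstance(pattern, str):            # labeled leaf: stands for any tree, even 0
        return {pattern: tree}
    if pattern is None or tree is None:     # the empty pattern matches only the empty tree
        return {} if pattern is tree else None
    left = match(pattern[0], tree[0])
    right = match(pattern[1], tree[1])
    if left is None or right is None:
        return None
    return {**left, **right}

def substitute(pattern, subst):
    """Replace every labeled leaf of `pattern` by the tree assigned to its label."""
    if isinstance(pattern, str):
        return subst[pattern]
    if pattern is None:
        return None
    return (substitute(pattern[0], subst), substitute(pattern[1], subst))

def apply_very_explicit(family, trees):
    """family: list of (p, q) with p a list of patterns (one per input tree), q a pattern."""
    for p, q in family:
        parts = [match(p_j, t_j) for p_j, t_j in zip(p, trees)]
        if all(s is not None for s in parts):
            subst = {}
            for s in parts:
                subst.update(s)
            return substitute(q, subst)
    raise ValueError("no pattern of the family matches the input")

# The pairing map T^2 -> T: a singleton family, p = (leaf 1, leaf 2), q = [1, 2].
PAIRING = [(["1", "2"], ("1", "2"))]

# A hypothetical second example, T -> T: swap the two subtrees of the root.
SWAP = [([None], None), ([("a", "b")], ("b", "a"))]

paired = apply_very_explicit(PAIRING, [None, (None, None)])
swapped = apply_very_explicit(SWAP, [((None, None), None)])
```

The loop returns at the first matching pattern; by the definition, exactly one
pattern of the family matches a given tuple, so the order of the family is irrelevant.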

Proof of Theorem 1. We shall define the desired bijection f : T^7 → T by cases
depending on the structure of its input t = (t_1, ..., t_7) (down to depth 4). To
improve readability, we shall write integers k in place of the trees t_k.

Case 1. At least one of the first four trees is non-empty. Output the tree
[[[[[[7, 6], 5], 4], 3], 2], 1].

Case 2. Trees 1 through 4 are empty, but tree 5 is not, say 5 = [5a, 5b]. Output
the tree [[[[0, 7], 6], 5a], 5b].

Case 3. Trees 1 through 5 are empty, but tree 6 is not. Output the tree
[[[[[6, 7], 0], 0], 0], 0].

Case 4. Trees 1 through 6 are empty, but the leftward path in tree 7 has at least
4 nodes. So 7 = [[[[7a, 7b], 7c], 7d], 7e]. Output the tree [[[[[0, 7a], 7b], 7c], 7d], 7e].

Case 5. Otherwise, output tree 7.

This f clearly fits the rough description above of very explicit functions. To see
that it fits the precise description, we should list the appropriate patterns p_i and
q_i. There are eleven p_i's and eleven corresponding q_i's: four for Case 1 (according
to which of trees 1 through 4 is the first nonempty one), one each for Cases 2, 3, 
and 4, and four for Case 5 (according to whether the leftward path of tree 7 has 0, 
1, 2, or 3 nodes). We refrain from explicitly exhibiting the results of this splitting 
of cases. 

We must still verify that we have defined a bijection, i.e., that every tree arises 
exactly once as the output of this construction. For this purpose, note that the 
leftward path of the output has length 

≥ 6 in Case 1,

4 in Case 2,

≥ 6 in Case 3 (remember that tree 6 is non-empty here),

5 in Case 4, and

≤ 3 in Case 5.

This shows that no two cases, with the possible exception of Cases 1 and 3, could 
produce the same tree as output. The possible exception is not a real exception, 
because in Case 1 at least one of the first four nodes on the leftward path has a 
right successor (because those successor subtrees are 1 through 4, of which at least 
one is non-empty by case hypothesis) whereas in Case 3 this is not the case. Thus, 
given an arbitrary tree t, we can determine the unique case that might produce it 
as output. Once this is done, it is straightforward to check that each case actually 
does produce all the trees thereby assigned to it, and that it produces them exactly 
once each. □ 
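
The five cases, together with the inverse suggested by the leftward-path argument,
can be written out and checked by brute force on small trees. The following Python
sketch is an illustration under stated assumptions, not part of the original proof:
trees are encoded as None (the empty tree) or a pair (left, right), and the names
E, lpath, f, f_inverse, and trees are introduced here for the example.

```python
from functools import lru_cache
from itertools import product

E = None  # the empty tree, written 0 in the text; a non-empty tree is (left, right)

def lpath(t):
    """Number of nodes on the leftward path of t."""
    n = 0
    while t is not E:
        n, t = n + 1, t[0]
    return n

def f(t1, t2, t3, t4, t5, t6, t7):
    """The map T^7 -> T defined by Cases 1 to 5."""
    if any(t is not E for t in (t1, t2, t3, t4)):       # Case 1
        return ((((((t7, t6), t5), t4), t3), t2), t1)
    if t5 is not E:                                      # Case 2
        a, b = t5
        return ((((E, t7), t6), a), b)
    if t6 is not E:                                      # Case 3
        return (((((t6, t7), E), E), E), E)
    if lpath(t7) >= 4:                                   # Case 4
        ((((a, b), c), d), e) = t7
        return (((((E, a), b), c), d), e)
    return t7                                            # Case 5

def f_inverse(t):
    """Recover the seven-tuple; the leftward-path length identifies the case."""
    n = lpath(t)
    if n <= 3:                                           # Case 5
        return (E,) * 6 + (t,)
    if n == 4:                                           # Case 2
        ((((_, t7), t6), a), b) = t
        return (E, E, E, E, (a, b), t6, t7)
    if n == 5:                                           # Case 4
        (((((_, a), b), c), d), e) = t
        return (E,) * 6 + (((((a, b), c), d), e),)
    ((((u, r4), r3), r2), r1) = t                        # n >= 6: Case 1 or Case 3
    if any(r is not E for r in (r1, r2, r3, r4)):        # Case 1
        ((t7, t6), t5) = u
        return (r1, r2, r3, r4, t5, t6, t7)
    t6, t7 = u                                           # Case 3
    return (E, E, E, E, E, t6, t7)

@lru_cache(maxsize=None)
def trees(n):
    """All trees with exactly n nodes."""
    if n == 0:
        return (E,)
    return tuple((l, r) for k in range(n) for l in trees(k) for r in trees(n - 1 - k))

# Round trips on a finite sample: all seven-tuples of trees with at most 2 nodes,
# and all single trees with at most 7 nodes.
small = [t for n in range(3) for t in trees(n)]
tuples_ok = all(f_inverse(f(*ts)) == ts for ts in product(small, repeat=7))
trees_ok = all(f(*f_inverse(t)) == t for n in range(8) for t in trees(n))
```

Such a finite check is of course no substitute for the proof; it only confirms that
the case analysis and its inverse agree on the sampled inputs.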

Theorem 1 would remain true if we extended the concept of tree to allow infinite 
trees, and the proof would be unchanged. The reason is that a very explicit bijection 
can be applied to infinite trees just as to finite trees, since below a certain depth
it simply copies whatever structure it finds. Variant
notions of "tree" can be handled similarly. For example, we could allow trees to
have finitely many infinite paths; we could allow infinite paths only if, beyond some 
node on a path, all further nodes are left children; we could fix a set C of "colors" 
and allow infinite trees in which every infinite path is assigned a color from C 
(and we could impose continuity requirements on the coloring); etc. In each case, 
Theorem 1 remains true as long as both occurrences of "trees" are interpreted the
same way. 

2. Algebra 

The proof of Theorem 1 raises at least two questions. Where did it come from? 
And why seven? The reader is invited to try some "easier" numbers in place of 
seven, say five or two; no analogous proof will be forthcoming, so there is indeed 
something special about seven. (Of course there is an analogue for thirteen. Given 
thirteen trees, apply Theorem 1 to code the first seven as a single tree, and then 
code this tree with the other six of the original inputs. The same trivial observation 
handles any number congruent to 1 modulo 6.) 

To see why seven is special, we first give an argument to establish Theorem 1 
in the style of eighteenth-century analysis, where meaningless computations (e.g., 
manipulating divergent series as though they converged absolutely and uniformly) 
somehow gave correct results. This argument begins with the observation that a
tree either is 0 or splits naturally into two subtrees (by removing the root). Thus
the set T of trees satisfies T = 1 + T^2. (Of course equality here actually means
an obvious isomorphism. Note that the same equation holds also for the variant
notions of tree mentioned at the end of Section 1.) Solving this quadratic equation
for T, we find T = 1/2 ± i√3/2. (The reader who objects that this is nonsense has
not truly entered into the eighteenth-century spirit.) These complex numbers are
primitive sixth roots of unity, so we have T^6 = 1 and T^7 = T. And this is why
seven-tuples of trees can be coded as single trees.
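
The numerical side of this computation is easy to confirm. The short Python check
below is only a sanity test of the arithmetic with floating-point complex numbers;
the variable names and the tolerance 1e-12 are choices made here for illustration.

```python
import cmath

# The two roots of T = 1 + T^2, namely 1/2 ± i*sqrt(3)/2.
roots = [(1 + sign * cmath.sqrt(-3)) / 2 for sign in (1, -1)]

checks = []
for t in roots:
    checks.append(abs(t - (1 + t**2)) < 1e-12)     # t solves T = 1 + T^2
    checks.append(abs(t**6 - 1) < 1e-12)           # t is a sixth root of unity
    checks.append(abs(t**7 - t) < 1e-12)           # hence t^7 = t
    # primitive: no smaller positive power of t equals 1
    checks.append(all(abs(t**k - 1) > 0.5 for k in range(1, 6)))
```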

Although this computation is nonsense, it has at least the psychological effect of 
suggesting that something like Theorem 1 has a better chance of being true for seven 
than for five or two. To improve the effect from psychological to mathematical, we 
attempt to remove the nonsense while keeping the essence of the computation. 
Incidentally, Lawvere's remark that led to this paper was phrased in terms of such 
a meaningless computation giving a correct result: "I was surprised to note that
an isomorphism x = 1 + x^2 (leading to complex numbers as Euler characteristics if
they don't collapse) always induces an isomorphism x^7 = x" [8, p. 11]. This remark
was an application of the method of Schanuel [10] for developing objective meaning 
for such computations. 

One might hope for a meta-theorem to the effect that, when such a meaningless
computation leads from a meaningful equation (like T = 1 + T^2) to another
meaningful equation (like T^7 = T) then the latter honestly follows from the former.
This would be somewhat analogous to Hilbert's program for making constructive
sense of infinitary mathematics by showing that, when a detour through the
infinite leads from one finitistically meaningful statement to another, then the latter
honestly (finitistically) follows from the former. Unfortunately, our situation (like
Hilbert's) is more subtle. After all, T^6 = 1 is a meaningful equation, saying that
there is only one six-tuple of trees, which is false. So any rehabilitation of the
computation must explain why the final result T^7 = T is
right while the penultimate T^6 = 1 is not.

As a first step toward rehabilitation, we observe that we can avoid complex 
numbers by working entirely within the ring of polynomials with integer coefficients. 
(The possibility of such a step is guaranteed by general facts about polynomial rings 
and ideals, but here is the step explicitly.) We simplify T^7 by repeatedly reducing
its degree, using the equation T = 1 + T^2 in the form T^2 = T - 1. The result is

T^7 = T^6 - T^5 = -T^4 = -T^3 + T^2 = T.

The complex numbers are gone, but the computation is still meaningless (when we 
remember that T is a set) because of negative coefficients, and we could still get 
T^6 = 1 just by reducing all exponents by one.
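
The degree reduction is mechanical and can be reproduced in a few lines of Python.
The sketch below works in Z[T] with the rule T^2 = T - 1; the function name
reduce_mod and the coefficient-list encoding (index = exponent) are assumptions
made here for illustration.

```python
def reduce_mod(coeffs):
    """Reduce a polynomial in Z[T] to the form c0 + c1*T using T^2 = T - 1.

    coeffs[k] is the integer coefficient of T^k.
    """
    c = list(coeffs) + [0, 0]            # padding so that c[0] and c[1] always exist
    for k in range(len(c) - 1, 1, -1):
        a = c[k]
        c[k] = 0
        c[k - 1] += a                    # a*T^k = a*T^(k-1) - a*T^(k-2)
        c[k - 2] -= a
    return c[0], c[1]

def power(k):
    """Coefficient list of the monomial T^k."""
    return [0] * k + [1]

t7 = reduce_mod(power(7))    # (0, 1), i.e. T
t6 = reduce_mod(power(6))    # (1, 0), i.e. 1
```

Both T^7 = T and T^6 = 1 come out of the same reduction, which is why ring
arithmetic alone cannot separate the true statement about trees from the false one.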

The next step is to eliminate the negative terms by adding new terms (T^5, T^4,
and T^3) to cancel them. The resulting calculation, which takes place in the semiring
N[T]/(T = 1 + T^2), reads

T^7 + T^5 + T^4 + T^3 = T^6 + T^4 + T^3 = T^5 + T^3
    = T^5 + T^4 + T^2 = T^5 + T^4 + T^3 + T.

Everything here is meaningful and correct when T is interpreted as the set of 
trees (rather than a formal indeterminate subject to T = 1 + T^2) and equality is
interpreted as obvious isomorphism (or, better, as very explicit bijection).
Unfortunately, instead of getting a very explicit bijection from T^7 to T, we have the
extra terms T^5 + T^4 + T^3 on both sides. Can we get rid of them?

There is a general way to convert a bijection f : A + X → B + X into a bijection
f' : A → B when X is finite, namely the Garsia-Milne involution principle [6]. The
idea is to define f'(a) by first applying f to a; if the result is in B, accept it as
the output for f'; if not, then it is in X, so we can apply f again; if the result
is in B, accept it as the output for f'; if not, then it is in X, so we can apply f
again; and so forth until we finally get an output in B. The finiteness of X is used
to ensure that the process terminates. In our situation, X = T^5 + T^4 + T^3 is not
finite. One can try to apply the Garsia-Milne construction anyway, hoping that the 
process terminates even though no finiteness forces it to. Unfortunately, it does not 
terminate. 
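
For finite sets, the iteration just described is short enough to write down. The
following Python sketch illustrates it on a toy example; the dictionary encoding of
f, the element names, and the function name garsia_milne are assumptions made here,
and the loop is guaranteed to stop only because X is finite and f is a bijection.

```python
def garsia_milne(f, B):
    """Given a bijection f : A + X -> B + X (as a dict), return f' : A -> B."""
    def f_prime(a):
        y = f[a]
        while y not in B:        # y lies in X, so f can be applied again
            y = f[y]
        return y
    return f_prime

# Toy data: A = {a0, a1}, B = {b0, b1}, X = {x0, x1, x2}.
A = {"a0", "a1"}
B = {"b0", "b1"}
f = {"a0": "x0", "a1": "b0", "x0": "x1", "x1": "b1", "x2": "x2"}

f_prime = garsia_milne(f, B)
image = {f_prime(a) for a in A}
```

Here a0 travels through x0 and x1 before landing on b1, while a1 reaches b0 at once;
the element x2, which f maps to itself, is never visited from A.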

We can do better by considering the last computation displayed above. It is in 
two parts. In the first part, T^7 + T^5 + T^4 + T^3 is simplified to T^5 + T^3; that is,
T^7 + T^4 is eliminated. In this elimination, the presence of T^5 was essential as a
sort of catalyst, since the T^5 is removed at the first step and restored at the second.
T^3, on the other hand, is irrelevant to this half of the calculation. Clearly, the same
argument shows that, in any polynomial in N[T]/(T = 1 + T^2), we can delete or
introduce T^{k+3} + T^k provided T^{k+1} is also present as a catalyst. The second half
of the displayed computation introduces T^4 + T, using T^3 as a catalyst; the same
argument would allow us to introduce or delete T^{k+3} + T^k provided T^{k+2} is also
present as a catalyst. Thus, we can delete or introduce two powers of T whose 
exponents differ by exactly three, provided one of the two intervening powers is 
also present as a catalyst. 

This result can be significantly improved by noticing that any positive power of 
T can serve as a catalyst; it need not be between the two that are being deleted or
introduced. To see this, suppose we want to delete or introduce T^{k+3} + T^k and we
have another positive power T^r present in the polynomial. If r is k + 1 or k + 2, then
we already know T^r can serve as the desired catalyst. If not, then use T = 1 + T^2
to replace T^r with T^{r-1} + T^{r+1}. So now we have two powers of T, in one of which
the exponent is closer to the desired k + 1 or k + 2. Apply the same procedure to
that power, leaving the other one alone, and repeat the process until finally you get
the desired catalyst. Use it to delete or introduce T^{k+3} + T^k, and then reverse the
previous steps to recover the original T^r by reassembling the terms into which the
first part of the procedure decomposed it. (Note that we needed r to be strictly
positive in order to replace T^r with T^{r-1} + T^{r+1}.)

Now we can, in N[T]/(T = 1 + T^2), convert T^7 into T^7 + T^4 + T (introducing
T^4 + T with T^7 as catalyst) and then into T (deleting T^7 + T^4 with T as catalyst).
We refrain from writing out explicitly the computation in N[T]/(T = 1 + T^2)
obtained by the method of the preceding paragraph. It consists of twenty steps:
four to convert T^7 into a usable catalyst T^3 (plus extra terms), two to introduce
T^4 + T using this catalyst, four to collect the catalyst and extra terms back into
a T^7, four to convert T into a usable catalyst T^5 (plus extra terms), two to delete
T^7 + T^4 using this catalyst, and four to collect the catalyst and extra terms back
into a T. Notice that the proof of T^7 = T does not become a proof of T^6 = 1 if we
reduce the exponents; unlike T, 1 cannot serve as a catalyst.

We can now answer the first of the questions at the beginning of this section:
where did the bijection in the proof of Theorem 1 come from? It came from this
twenty-step computation. Wherever the computation used T = 1 + T^2, apply the
obvious bijection between the sets T and {0} ∪ T^2, and the whole computation will
give a composite bijection which is the one used to prove Theorem 1 (except that
I interchanged left and right once, to make the description of the proof easier). 

As for the second question (why seven?), we have a partial answer. The
technique that led to the proof of Theorem 1 works only for numbers k that are
congruent to 1 modulo 6, because it is only for these numbers that T^k = T is true
in the semiring N[T]/(T = 1 + T^2), or even in the rings Z[T]/(T = 1 + T^2) or
C[T]/(T = 1 + T^2), i.e., because it is only for these values of k that T^k = T is
satisfied by the roots 1/2 ± i√3/2 of T = 1 + T^2. A more complete answer to the second
question would involve showing that very explicit bijections between T^k and T can
exist only when the corresponding polynomials are equal in N[T]/(T = 1 + T^2).
This will be done, in greater generality, in the next section.

Because of the importance of the semiring N[T]/(T = 1 + T^2) in computations
like the preceding ones, we present a normal form for its elements.

Theorem 2. Every element of N[T]/(T = 1 + T^2) is uniquely expressible in the
form a + bT^2 + cT^4 with non-negative integer coefficients a, b, and c such that either
at least one coefficient is 0 or else a ≥ b = c = 1.

Proof. Every element of the semiring N[T]/(T = 1 + T^2) is a polynomial in T with
non-negative integer coefficients. We show how to simplify such a polynomial to
the form claimed in the theorem. Since T^7 = T, all terms of degree 7 or more can
be reduced to lower degree, so we may assume the polynomial has degree at most
6. The degree can be reduced to at most 4 by means of the equations

T^6 = T^5 + T^7 = T^5 + T and T^5 = 1 + T^3 + T^5 = 1 + T^4,

where 1 + T^3 was introduced in the second equation with catalyst T^5. Furthermore,
the odd powers T and T^3 can be eliminated by means of T = 1 + T^2 and
T^3 = T^2 + T^4. So our polynomial has the form a + bT^2 + cT^4. We check next that
the coefficient
restrictions in the theorem can be enforced. 

For this purpose, consider the polynomial Q = 1 + T^2 + T^4. In the presence of a
catalyst (any positive power of T), Q vanishes, for it equals T + T^4 which a catalyst
can delete. Now consider our general polynomial a + bT^2 + cT^4. If at least one of
the coefficients is 0, then it has the desired form, so suppose all three coefficients are
at least 1. Then the polynomial is Q + (a - 1) + (b - 1)T^2 + (c - 1)T^4. If either b - 1
or c - 1 is positive, we have a catalyst and can remove Q. Repeating the process of
splitting off a Q and deleting it as long as a catalyst is available, we ultimately get
either a polynomial of the form a + bT^2 + cT^4 with a zero coefficient (if, at some
stage, we cannot split off a Q) or one of the form d + Q with d ∈ N (if, at some
stage, we have split off a Q but have no catalyst to remove it). This completes the
proof that every polynomial can be put in the required form; it remains to show 
uniqueness. 

If two expressions of the form described in the theorem, say a + bT^2 + cT^4 and
a' + b'T^2 + c'T^4, are equal in N[T]/(T = 1 + T^2), then they are also equal as complex
numbers when T is interpreted as the primitive sixth root of unity t = 1/2 + i√3/2
(since C is a commutative semiring, and t = 1 + t^2, and N[T]/(T = 1 + T^2) with T
is the initial commutative semiring with a solution of T = 1 + T^2). Now t^2 is a
primitive cube root of unity; its minimal polynomial is 1 + z + z^2. So this minimal
polynomial would have to divide (a - a') + (b - b')z + (c - c')z^2, which means that
a - a', b - b', and c - c' are all equal. If they are all zero, then a + bT^2 + cT^4 and
a' + b'T^2 + c'T^4 are identical as expressions, which is what we needed to prove. So
suppose that the differences a - a', b - b', and c - c' are all positive. (If they are all
negative, interchange primed and unprimed in what follows.) A fortiori, a, b, and c
are all positive, so a + bT^2 + cT^4, being of the form in the theorem, has b = c = 1
and a ≥ 1, i.e., it is (a - 1) + Q. As a - a', b - b', and c - c' are equal and positive,
we must have b' = c' = 0 and a' = a - 1.

Our task is thus reduced to showing that a' and a' + Q are not equal in
N[T]/(T = 1 + T^2). For this purpose, notice that the set consisting of the natural
numbers and the cardinal ℵ_0 is a commutative semiring (with the usual addition and
multiplication of cardinal numbers) in which ℵ_0 = 1 + ℵ_0^2. So an equation
a' = a' + Q in N[T]/(T = 1 + T^2) would imply the same equation in this cardinal
semiring with T interpreted as ℵ_0. But in this interpretation such an equation is
false, since a' is interpreted as the integer a' while a' + Q is interpreted as ℵ_0.
So the equation cannot hold in N[T]/(T = 1 + T^2). □
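
The reduction in the proof of Theorem 2 can be turned into a small procedure. The
Python sketch below is an illustration under stated assumptions: a polynomial is a
dictionary from exponents to non-negative integer coefficients, the names BASE and
normal_form are introduced here, and the table entries for T and T^3 use the
identities T = 1 + T^2 and T^3 = T^2 + T^4, which follow directly from the defining
relation. The entries for T^5 and T^6 come from the equations displayed in the proof.

```python
# (a, b, c) stands for a + b*T^2 + c*T^4; these are the forms of 1, T, ..., T^6.
BASE = {
    0: (1, 0, 0),   # 1
    1: (1, 1, 0),   # T   = 1 + T^2
    2: (0, 1, 0),   # T^2
    3: (0, 1, 1),   # T^3 = T^2 + T^4
    4: (0, 0, 1),   # T^4
    5: (1, 0, 1),   # T^5 = 1 + T^4
    6: (2, 1, 1),   # T^6 = T^5 + T = 2 + T^2 + T^4
}

def normal_form(poly):
    """Normal form (a, b, c) of a polynomial given as {exponent: coefficient}."""
    a = b = c = 0
    for k, coeff in poly.items():
        if k > 6:
            k = (k - 1) % 6 + 1      # T^7 = T, so exponents >= 1 repeat with period 6
        da, db, dc = BASE[k]
        a, b, c = a + coeff * da, b + coeff * db, c + coeff * dc
    # Split off Q = 1 + T^2 + T^4 and delete it as long as a catalyst
    # (a positive power of T) remains after the split.
    while min(a, b, c) >= 1 and max(b, c) >= 2:
        a, b, c = a - 1, b - 1, c - 1
    return (a, b, c)

nf_T7 = normal_form({7: 1})
nf_T = normal_form({1: 1})
nf_T6 = normal_form({6: 1})
nf_1 = normal_form({0: 1})
```

In this encoding T^7 and T receive the same normal form, while T^6 reduces to 1 + Q,
which by the uniqueness part of the theorem differs from 1.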

3. Algebraic Equivalence and Very Explicit Bijections 

Let us call two polynomials, P(X) and Q(X), with non-negative integer
coefficients algebraically equivalent if the equation P(X) = Q(X) holds in the semiring
N[X]/(X = 1 + X^2). We call the indeterminate X rather than T to avoid confusion,
since we shall need to discuss it and the set T of trees in the same context. 

A polynomial P(X) ∈ N[X] has a natural interpretation as an operation S ↦
P(S) on sets (well-defined up to canonical bijections); sums and products of
polynomials correspond to disjoint unions and Cartesian products of sets. (We can
interpret the numerical coefficients in P(X) as canonically chosen sets of the
corresponding cardinalities, or we can eliminate these coefficients in favor of sums of
equal terms.) Thanks to the canonical bijection from T to 1 + T^2, algebraically
equivalent polynomials, when applied to the set T of trees, give canonically
isomorphic sets. In fact, we shall show, as half of the next theorem, that the canonical
isomorphism is very explicit in the sense of Section 1. 

Let us call two polynomials P(X) and Q(X) combinatorially equivalent if there
is a very explicit bijection from P(T) to Q(T). Of course, we must extend the
definition of "very explicit," given in Section 1 for the special case of T^7 and T, to
the general case of P(T) and Q(T), but this is straightforward. If P(X) = Σ_k c_k X^k,
then we regard P(T) as the set of tagged k-tuples of trees, t = (t_1, ..., t_k, τ), where
k ranges over the same finite set as in the sum defining P(X) and where the tag τ
is an integer in the range 1 ≤ τ ≤ c_k. For brevity, we call such a tagged k-tuple a
P-tuple. A P-pattern is a similarly tagged k-tuple of patterns (in the sense defined
in Section 1) in which all the labels are distinct, and the notion of an instance of a
pattern with respect to a substitution is defined just as in Section 1. A very explicit
function f from P(T) to Q(T) is given by an indexed family (p_i)_{i∈I} of P-patterns
and a similarly indexed family (q_i)_{i∈I} of Q-patterns such that, for each i ∈ I, the
same labels occur in p_i as in q_i and such that every element t of P(T) is an instance
of a unique p_i (under a unique substitution). Then f(t) is defined as the result of
applying the same substitution to q_i (for the same i).

Remark. If the very explicit function f is a bijection from P(T) to Q(T), then
every element of Q(T) occurs exactly once as an instance of a q_i (in the notation of
the preceding definition). It follows that the inverse bijection is also very explicit,
as we can interchange the roles of the p_i and the q_i. Thus, it makes no difference
whether "very explicit bijection" is interpreted in the obvious way as "very explicit 
function that happens to be a bijection" or as "very explicit function with very 
explicit two-sided inverse." 

Our interest will be focused on very explicit bijections, and for the study of 
these our definition of "very explicit" seems adequate. If we were to study very 
explicit functions in general, it would be advisable to liberalize the definition of
"very explicit" by allowing labels to be repeated in a q_i and allowing a label to
occur in p_i without occurring in q_i. In either of these cases, the very explicit
function cannot be a bijection.

Theorem 3. Two polynomials P(X), Q(X) ∈ N[X] are combinatorially equivalent
if and only if they are algebraically equivalent. 

Proof. Suppose first that P(X) and Q(X) are algebraically equivalent. Since N[X]
is the free commutative semiring on one generator, its quotient N[X]/(X = 1 + X^2)
is the initial algebra in the variety V of commutative semirings with a distinguished
element X subject to X = 1 + X^2. Algebraic equivalence therefore means that
the equation P(X) = Q(X) is satisfied by the whole variety V. (Note that here
X is a constant for the distinguished element of a V-algebra.) By Birkhoff's
theorem [3, Theorem 14.19], there is an equational deduction of P(X) = Q(X) from
X = 1 + X^2 and the axioms for commutative semirings, using as rules of inference
only substitution of equals for equals and substitution of terms for variables. All
the substitutions for variables can be done at the beginning, as substitutions into
axioms. So we get a deduction of P(X) = Q(X) from X = 1 + X^2 and variable-free
instances of the commutative semiring axioms, using only substitution of equals for
equals.

Consider the following property of equations: when the two sides are applied
as operations to T, the two resulting sets have a bijection between them such that
both the bijection and its inverse are very explicit. It is trivial to check that the
equation X = 1 + X^2 and all variable-free instances of the axioms of commutative
semirings enjoy this property and that the property is preserved by substitution of 
equals for equals. It therefore follows that P(X) = Q(X) has this property, which 
implies combinatorial equivalence. 

For the converse, suppose that P(X) and Q(X) are combinatorially equivalent. 
Specifically, let f be a very explicit bijection from P(T) to Q(T), and let (p_i)_{i∈I}
and (q_i)_{i∈I} be as in the definition of "very explicit" preceding the theorem.

It will be convenient, both for this proof and for Section 4, to introduce certain 
standard collections of patterns. (We temporarily deal with patterns in the sense 
of Section 1, single trees; we will return to tuples and P-patterns later.) For each
positive integer n, let S_n be the set of patterns p of depth ≤ n + 1 such that all
nodes at level n have exactly two children, all nodes at level n + 1 are labeled
(with distinct labels, as required by the definition of pattern), and no nodes at
levels ≤ n are labeled. If p ∈ S_n then its instances are exactly those trees that
are identical with p down to depth n, with no constraints on what (if anything)
happens at greater depth. In keeping with this, we can define S_0 to consist of just
one pattern, which consists of a single, labeled node. Clearly, for every n, every
tree is an instance of a unique member of S_n.

If r is any pattern of depth < n, then we can associate to it a set \hat{r} ⊆ S_n having
collectively the same instances as r. One can produce \hat{r} from r by the following
step-by-step procedure, which we call developing r (to depth n). Choose any labeled
leaf in r and replace r with two new patterns, r' in which this leaf has been deleted,
and r'' in which this leaf has been given two labeled children (and has lost its own
label, as it is no longer a leaf). Apply the same construction to r' and r'', using
labeled leaves at depths ≤ n. Iterate the process until all the patterns have their
labels at level n + 1. At this stage, they are easily seen to be in S_n. Furthermore,
the instances of r are precisely the instances of r' and those of r'' (according to
whether the substitution gives the label of the chosen leaf the value 0 or not). And
no tree is an instance of both r' and r''. It follows inductively that, at each stage of
the development, the patterns have disjoint sets of instances and the union of these
sets contains precisely the instances of r. Thus, the final set of patterns obtained
by this development is \hat{r}.

We need to extend the preceding concepts from patterns to P-patterns (and
Q-patterns). By S_n(P) we mean the set of all P-patterns whose component patterns
lie in S_n. (Recall that a P-pattern is a tagged k-tuple of patterns, so this makes
sense.) Every P-tuple of trees is an instance of a unique pattern in S_n(P) (for each
fixed n). P-patterns can be developed componentwise; that is, the development
of (r_1, ..., r_k, τ) to depth n (> the depths of all the r_j's) consists of the patterns
(z_1, ..., z_k, τ) ∈ S_n(P) with each z_j in \hat{r}_j ⊆ S_n. Again, the members of the
development of r have disjoint sets of instances and the union of these sets contains
precisely the instances of r.

Let us return to the families (p_i)_{i∈I} and (q_i)_{i∈I} determining our very explicit
bijection f. Fix n greater than the depths of all the patterns in all these P- and
Q-patterns. Since every P-tuple is an instance of exactly one p_i, it follows from the
preceding discussion that the sets \hat{p}_i ⊆ S_n(P) obtained by developing p_i to depth
n are disjoint and their union is S_n(P).

By the weight of a pattern r, we mean the monomial X^l ∈ N[X]/(X = 1 + X^2),
where l is the number of labels in r. By the weight of a (tagged) tuple of patterns
(e.g., a P- or Q-pattern), we mean the product of the weights of its component 
patterns, i.e., X with exponent the total number of labels in the tuple. By the 
weight of a set of patterns or of (tagged) tuples of patterns, we mean the sum of 
the weights of its members. 

One step in the development of a pattern r replaces it by a set of two patterns r'
and r'' where, if r has weight X^l, then r' has weight X^{l-1} (as one labeled leaf was
deleted) and r'' has weight X^{l+1} (as one labeled leaf lost its label but was given two
labeled children). So the weight of the set obtained is
X^{l-1} + X^{l+1} = X^{l-1}(1 + X^2) = X^l, since the weights are in a semiring where
1 + X^2 = X. So one step of development leaves the weight unchanged. It follows by
induction that development of a pattern to depth n leaves the weight unchanged; r and
r̂ have the same weight. It further follows easily that the same applies to (tagged)
tuples of patterns.

These observations imply that the set S_n has weight X, because S_0 trivially has
weight X and S_{n+1} is obtainable by developing S_n. It follows that S_n(P) has weight
P(X), because the contribution to the weight from the k-tuples with a particular
tag is

\[
\sum_{\vec{r} \in (S_n)^k} \operatorname{weight}(\vec{r})
  = \sum_{\vec{r} \in (S_n)^k} \prod_{j=1}^{k} \operatorname{weight}(r_j)
  = \prod_{j=1}^{k} \sum_{r \in S_n} \operatorname{weight}(r)
  = \prod_{j=1}^{k} \operatorname{weight}(S_n)
  = X^k .
\]


Since, as we saw above, S_n(P) is obtainable by developing the set of p_i's, this set
also has weight P(X). Similarly, the set of q_i's has weight Q(X). But each p_i
contains exactly the same labels as the corresponding q_i, so these two sets have the
same weight. Therefore, P(X) = Q(X) in N[X]/(X = 1 + X^2). □

It follows immediately from Theorem 3 that, if P(X) and Q(X) are combinatorially
equivalent, then P(t) = Q(t) where t = 1/2 + i√3/2. Thus, the complex number
P(t) serves as an invariant of the combinatorial equivalence class of P(X). This is
what Lawvere referred to as "complex Euler characteristics" in the passage, quoted
in Section 2, that motivated this paper. In fact, Theorem 3 and the proof of
Theorem 2 show that the semiring of combinatorial equivalence classes of polynomials
is embedded in the product of the complex field and the semiring of cardinals ≤ ℵ_0
(by sending the indeterminate X to (t, ℵ_0)). Lawvere has pointed out that for this
purpose one could replace the cardinal semiring with a three-element semiring, for
all the finite, non-zero cardinals can be identified without damaging the
embedding. The resulting three elements form the system of "dimensions" associated to
our problem by Schanuel's general construction [10], so that the present solution of
the word problem shows in particular that again in our case "Euler characteristic
and dimension" are jointly injective.
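
A quick numerical sanity check of this invariant can be sketched in Python. The helper names `evaluate` and `close`, the representation of polynomials as coefficient lists, and the use of floating-point arithmetic are all assumptions of this sketch. Since t satisfies t = 1 + t^2, evaluation at t respects the defining relation; agreement of P(t) and Q(t) is only a necessary condition for equivalence.

```python
import cmath

# t = 1/2 + i*sqrt(3)/2, a primitive sixth root of unity.
t = cmath.exp(1j * cmath.pi / 3)

def evaluate(coeffs, x=t):
    """Evaluate the polynomial whose coefficient of X^d is coeffs[d] at x."""
    return sum(c * x**d for d, c in enumerate(coeffs))

def close(a, b, eps=1e-9):
    """Compare two complex numbers up to rounding error."""
    return abs(a - b) < eps

X = [0, 1]
print(close(evaluate(X), evaluate([1, 0, 1])))      # X versus 1 + X^2
print(close(evaluate(X), evaluate([0] * 7 + [1])))  # X versus X^7
print(close(evaluate(X), evaluate([0, 0, 1])))      # X versus X^2
```

The first two comparisons agree, as they must for equivalent polynomials, while the third does not, because t - t^2 = 1; so X and X^2 are separated by the invariant.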

4. Constructive Set Theory 

We show in this section that the algebraic equivalence of P(X) and Q(X) can
be deduced from the mere provability of "there is a bijection from P(T) to Q(T)"
in a suitably restricted formal theory. The restrictions
we need are two: The underlying logic is constructive and the only assumption
about T is the existence of a bijection T → 1 + T^2. (Actually, the second restriction
can be relaxed a bit by allowing a stronger assumption about T.) On the other
hand, we allow the use of higher-order logic, so many set-theoretic methods are
available.

More precisely, let ℒ be a higher-order theory in the sense of [4] or a local set
theory in the sense of [1], generated by a natural number object and an additional
ground type T subject to the axiom "there is a bijection 1 + T^2 → T." Alternatively,
we could let ℒ be intuitionistic Zermelo-Fraenkel set theory augmented with a
constant T and the same axiom.

Theorem 4. Let P(X) and Q(X) be polynomials with non-negative integer
coefficients. Suppose it is provable in ℒ that there is a bijection P(T) → Q(T). Then
P(X) and Q(X) are algebraically equivalent.

Before proving the theorem, we make several remarks. First, the converse of
the theorem is easy to prove. By Theorem 3, algebraic equivalence implies the
existence of a very explicit bijection P(T) → Q(T) when T is the set of trees.
But the very-explicitness makes it possible to apply the bijection to arbitrary sets
for which a bijection 1 + T^2 → T is given, and this application can be carried
out in constructive set or type theory. Alternatively, we can proceed as in the
proof of Theorem 3, considering for all equations P(X) = Q(X) the property "it is
provable in ℒ that there is a bijection P(T) → Q(T)," noticing that this property
is enjoyed by the equation 1 + X^2 = X and by all variable-free instances of the
axioms for commutative semirings, noticing further that the property is preserved
by substitution of equals for equals, and concluding that the property holds of all
equations true in N[X]/(X = 1 + X^2).

The remaining remarks are intended to justify the restrictions we place on the
logic and the assumptions on T in the theory ℒ.

If we allowed full classical set theory, with the axiom of choice, then the
assumption T = 1 + T^2 implies that T is infinite and therefore T, 1 + T, T + T, and
T^2 all have the same cardinality. It follows that every non-constant polynomial is
equivalent, in the sense of provable bijection, to T. In other words, with this stronger
set theory, the corresponding notion of algebraic equivalence would be equality not
in N[X]/(X = 1 + X^2) but in N[X]/(X = 1 + X = X + X = X^2), a semiring
isomorphic to the N ∪ {ℵ_0} example used at the end of the proof of Theorem 2.

If we work in classical set theory without the axiom of choice, so that addition
and multiplication of infinite cardinals are no longer trivial, we still get the same
conclusion with a bit more work. From T = 1 + T^2 and its immediate consequence
T^2 = T + T^3, we infer that each of T and T^2 can be embedded in the other. By
the Cantor-Schröder-Bernstein Theorem, whose proof does not require the axiom
of choice, we have T = T^2. Then from 1 ≤ 2 ≤ T (where ≤ means embeddability,
the usual inequality relation on cardinals) we get T ≤ T + T ≤ T^2 ≤ T, so another
application of the Cantor-Schröder-Bernstein Theorem gives T = T + T.

If we use intuitionistic rather than classical logic, then the
Cantor-Schröder-Bernstein Theorem is no longer available and, as Theorem 4 shows, the
argument in the preceding paragraph breaks down. Even in intuitionistic logic, however,
if we assume that T is the set of trees (rather than some arbitrary set with a bijection
1 + T^2 → T) then the argument in the preceding paragraph works, since the
bijections produced by the Cantor-Schröder-Bernstein Theorem can in this case be
constructively defined. For example, we described, just before the proof of
Theorem 1, a bijection T^2 → T, and that description is intuitionistically legitimate. The
main point here is that the case distinction, whether a tree is just a leftward path,
is decidable because the tree is finite. There is a similarly constructive bijection
T + T → T when T is the set of finite trees. It sends any tree t from the first
copy of T to [0, t], and it sends any t from the second copy of T to [t, 0] unless t
is of the form 0 or [p, q] or [[p, q], 0] or [[[p, q], 0], 0] or ... with p and q both ≠ 0,
in which case it sends t to t. Again, it is the finiteness of the trees that makes the
case distinction decidable and the definition constructively correct.
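
The bijection T + T → T just described can be written out as a short Python sketch. The encoding is an assumption made here for illustration: the empty tree 0 is `None`, a tree [l, r] is the pair `(l, r)`, an element of T + T is a pair (copy, tree) with copy equal to 1 or 2, and the names `is_exceptional`, `encode` and `decode` are hypothetical.

```python
def is_exceptional(t):
    """True if t is 0, [p, q], [[p, q], 0], [[[p, q], 0], 0], ... with p, q non-empty."""
    if t is None:
        return True
    left, right = t
    if left is None:
        return False             # [0, q] is never of the listed forms
    if right is not None:
        return True              # [p, q] with p and q both non-empty
    return is_exceptional(left)  # [u, 0] with u non-empty

def encode(copy, t):
    """Send the tree t, taken from copy 1 or copy 2 of T + T, to a single tree."""
    if copy == 1:
        return (None, t)
    return t if is_exceptional(t) else (t, None)

def decode(u):
    """Inverse of encode: recover (copy, tree) from a tree u."""
    if u is not None and u[0] is None:
        return (1, u[1])
    if is_exceptional(u):
        return (2, u)
    return (2, u[0])
```

The recursion in `is_exceptional` terminates precisely because trees are finite, which is the decidability point made above.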

Proof of Theorem 4. We begin by giving a more useful description of sets T with
bijections f : 1 + T^2 → T. The bijection is determined by specifying a distinguished
element of T and a binary operation on T; the distinguished element is the value
of f at the unique element of 1, and the operation is the restriction of f to T^2. To
match the notation used earlier for trees, we write the distinguished element as 0
and the operation as [-, -]. Thus, T (with the structure f) is an algebra with one
constant and one binary operation. That f is a bijection means that this algebra
must satisfy the following system 𝕋 of axioms, which, for later convenience, we
write as geometric sequents (as defined in [1, page 250] or [7, Section 6.5]), indeed,
whenever possible, as universal Horn formulas.

(1) 0 = [x, y] ⇒ false

(2) [x, y] = [x', y'] ⇒ x = x'

(3) [x, y] = [x', y'] ⇒ y = y'

(4) true ⇒ x = 0 ∨ ∃y ∃z (x = [y, z])

The algebra of finite trees is initial in the variety of algebras of signature {0, [-, -]},
and, since it satisfies the axioms of 𝕋, it is also initial in the category of models
of 𝕋. Any model M of 𝕋 can be regarded as an algebra of generalized trees, in
that it has an element corresponding to the empty tree, and all its other elements
are uniquely of the form [y, z] and can therefore be pictured as consisting of a root
with two subtrees, y and z, attached to it. Among such algebras are, for example,
the collections of trees in any of the generalized senses mentioned at the end of
Section 1.

We shall prove the theorem by constructing a specific topos model of ℒ. The
hypothesis of the theorem says that in this model there must be a bijection
P(T) → Q(T), and an analysis of what this means will lead to the desired conclusion.
Perhaps the most natural topos model of ℒ is the classifying topos ([7, Section 6.5])
of 𝕋, with T interpreted as the generic model in this topos. For technical reasons,
however, it is easier to work with the classifying topos of a slightly stronger theory,
𝕋', obtained from 𝕋 by adding the following axiom for every term t that contains
the variable x but is not just x.

(5) t = x ⇒ false

This additional axiom schema says that no (generalized) tree in a model of 𝕋' can
be a proper subtree of itself. It is satisfied by the initial algebra (of finite trees),
but not by, for example, the algebra of all finite and infinite binary trees, where the
full binary tree x (every node of which has two children) satisfies x = [x, x].

Notice that, apart from making the proof easier, the addition of axiom schema
(5) to 𝕋 slightly improves the theorem. The theorem remains true (with the same
proof) if ℒ is strengthened by the assumption that T satisfies (5).
This strengthening of ℒ weakens the hypothesis of Theorem 4 and thus strengthens
the theorem.

As indicated above, we shall work with the classifying topos ℰ of 𝕋'. In it, there
is a generic (or universal) model G of 𝕋'; this implies that, for any model M of
𝕋' in any Grothendieck topos ℱ, there is a geometric morphism μ : ℱ → ℰ whose
inverse image functor μ^* sends G to M. (It implies more than this, but this, along
with the explicit construction described below, will suffice for our purposes.) Being
a Grothendieck topos, ℰ gives an interpretation of higher order logic (as in [4])
and local set theories ([1]), with natural number object, and intuitionistic
Zermelo-Fraenkel set theory ([5]), so by interpreting T as G we obtain a model of ℒ in
the internal logic of ℰ. By the hypothesis of Theorem 4, it must be internally true in
ℰ that there is a bijection from P(G) to Q(G).

The next part of the proof consists of studying ℰ and G in sufficient detail to
draw useful conclusions from this internal information. We begin by describing
ℰ explicitly as the topos of sheaves over a specific site. As explained in [9], it
is convenient to first build the classifying topos for the universal Horn axioms and
then obtain ℰ as a sheaf subtopos. For 𝕋', axioms (1), (2), (3), and (5) are universal
Horn sentences. The classifying topos for this subtheory is, according to [2], the
topos of presheaves on the dual of the category 𝒜 of finitely presented models of
(1), (2), (3), and (5). So we need to analyze such models ⟨A | E⟩, where A is a
finite set of generators and E is a finite, consistent set of equations between terms
built from the generators, 0, and [-, -].

We show first that every such model is free, i.e., is isomorphic to ⟨B⟩ = ⟨B | ∅⟩
for some finite set B. To see this, we systematically simplify the given set E of
equations as follows. (Technically, the simplification is an inductive process; at each
step, we either decrease the cardinality of A or we leave this cardinality unchanged
but decrease the total length of all the equations in E.) If 0 occurs as one side of an
equation in E, then the other side must be 0 or a member of A; it cannot be of the
form [t_1, t_2] because then the equation would be inconsistent by (1). If it is 0, then
the equation 0 = 0 can be deleted from E as it is always true. If it is a member
a of A, then we can delete a from A, delete the equation 0 = a (or a = 0) from
E, and replace all occurrences of a in the rest of E by 0. The result is a simpler
presentation of the same algebra. So we may assume from now on that 0 does not
occur as a side of an equation in E. Suppose next that an element a of A occurs
as a side of an equation in E. If the other side is also a, then the equation a = a
can simply be omitted. Otherwise, the other side must be a term t not involving
a, for if it involved a then the equation would contradict (5). So we can delete a
from A, delete a = t (or t = a) from E, and replace all other occurrences of a in
E by t. Again, we have a simpler presentation of the same algebra. So we may
assume that each equation in E has the form [t_1, t_2] = [t_3, t_4]; but such an equation
can, thanks to (2) and (3), be replaced with the two equations t_1 = t_3 and t_2 = t_4,
of lesser total length. So again we get a simpler presentation of the same algebra.
Repeating these steps, we find that the process must terminate, for the size of A
cannot decrease infinitely often, and, after it stops decreasing, the total length of
E cannot decrease infinitely often. But the only way the process can stop is if E
has become empty. This proves that ⟨A | E⟩ is isomorphic to ⟨B | ∅⟩ for some B
(a subset of A).
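
The simplification just described is close in spirit to syntactic unification with an occurs check, and it can be sketched in Python along those lines. This sketch processes the equations in a different order from the text and is offered only as an illustration. The encoding is an assumption: 0 is `None`, generators are strings, a term [s, t] is the pair `(s, t)`, and the names `occurs`, `substitute` and `simplify` are hypothetical.

```python
def occurs(a, t):
    """True if generator a occurs in term t."""
    if t == a:
        return True
    return isinstance(t, tuple) and (occurs(a, t[0]) or occurs(a, t[1]))

def substitute(t, a, s):
    """Replace every occurrence of generator a in term t by the term s."""
    if t == a:
        return s
    if isinstance(t, tuple):
        return (substitute(t[0], a, s), substitute(t[1], a, s))
    return t

def simplify(generators, equations):
    """Return (B, values): the surviving free generators and, for each
    eliminated generator, its value as a term over B."""
    gens, eqs, values = set(generators), list(equations), {}
    while eqs:
        lhs, rhs = eqs.pop()
        if lhs == rhs:
            continue                                     # trivially true
        if isinstance(lhs, tuple) and isinstance(rhs, tuple):
            eqs += [(lhs[0], rhs[0]), (lhs[1], rhs[1])]  # axioms (2), (3)
            continue
        if not isinstance(lhs, str):
            lhs, rhs = rhs, lhs                          # generator on the left
        if not isinstance(lhs, str):
            raise ValueError("0 = [t1, t2] contradicts axiom (1)")
        if occurs(lhs, rhs):
            raise ValueError("a = t with a inside t contradicts schema (5)")
        gens.discard(lhs)                                # eliminate lhs
        eqs = [(substitute(l, lhs, rhs), substitute(r, lhs, rhs))
               for l, r in eqs]
        values = {g: substitute(v, lhs, rhs) for g, v in values.items()}
        values[lhs] = rhs
    return gens, values
```

For instance, with generators a, b, c and the single equation [a, b] = [0, [c, c]], the sketch eliminates a and b (as 0 and [c, c]) and leaves c as the only free generator.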

By virtue of this simplification, we may regard 𝒜 as consisting of only the free
algebras ⟨A⟩ on finite sets A. Furthermore, we may assume that the only sets
A occurring are of the form {1, 2, ..., k} for natural numbers k, since every A is
isomorphic to one of these. We write ⟨k⟩ for ⟨{1, 2, ..., k}⟩.

The elements of ⟨k⟩ are the variable-free terms of the language having the
constant symbol 0, the binary operation [-, -], and constant symbols for the generators
1, 2, ..., k. They can be identified with trees in which leaves may (but need not) be
labeled with integers in the range from 1 to k. So they are like patterns (defined in
Section 1) except for the restriction on the possible labels and the fact that several
leaves are allowed to have the same label. We call them k-labeled trees. Note in
particular that the members of ⟨0⟩ are simply the trees.

A morphism in 𝒜 from ⟨k⟩ to ⟨l⟩ is, since ⟨k⟩ is free, simply a map from
{1, 2, ..., k} into ⟨l⟩, i.e., a k-tuple of l-labeled trees. To compose this with some
⟨l⟩ → ⟨m⟩, i.e., with an l-tuple of m-labeled trees, take the k-tuple of l-labeled
trees and replace, in each of its component trees, each leaf labeled j with the jth
m-labeled tree in the given l-tuple. The identity morphism of ⟨k⟩ is the k-tuple
whose ith member is a single node labeled i.
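
Composition by substitution can be made concrete with a small Python sketch. The encoding is again an assumption of the sketch: the empty tree is `None`, a leaf labeled j is the integer `j`, a node [l, r] is the pair `(l, r)`, a morphism is a tuple of such trees, and the names `plug`, `compose` and `identity` are hypothetical.

```python
def plug(tree, images):
    """Replace each leaf labeled j in tree by images[j - 1]."""
    if tree is None:
        return None
    if isinstance(tree, int):
        return images[tree - 1]
    return (plug(tree[0], images), plug(tree[1], images))

def compose(f, g):
    """Composite <k> -> <l> -> <m> of f (a k-tuple of l-labeled trees)
    with g (an l-tuple of m-labeled trees)."""
    return tuple(plug(tree, g) for tree in f)

def identity(k):
    """Identity morphism of <k>: the ith component is a single node labeled i."""
    return tuple(range(1, k + 1))
```

With f = `((1, 2), None)` and g = `((None, 1), 3)`, the composite `compose(f, g)` is `(((None, 1), 3), None)`, and composing with `identity` on either side returns the other morphism unchanged.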

The topos of set-valued functors on 𝒜 is the classifying topos for the universal
Horn theory axiomatized by (1), (2), (3), and (5). The universal model U is the
underlying set functor. For details about this, see for example [2].

To obtain the classifying topos for the full theory 𝕋', we must pass to the
subtopos of sheaves for the Grothendieck topology "forcing" the remaining axiom, (4).
See [9, 11] for more information about forcing topologies. In the case at hand, the
topology in question is that described in Part (1) of the following lemma, whose
other parts give useful alternative ways of viewing this topology. In connection
with Parts (2) and (3) of the lemma, recall that in the proof of Theorem 3 we
introduced, for any polynomial P(X), a set S_n(P) of tagged tuples of patterns; we
shall need this for the polynomials X^k. In this special case of monomials, all tags
are 1, so they can be omitted, and all the tuples are k-tuples. So the elements of
S_n(X^k) can be taken to be simply k-tuples of patterns and thus, by suitable choice
of labels, morphisms ⟨k⟩ → ⟨l⟩ for certain l's.

Lemma. The following four Grothendieck topologies on the dual of 𝒜 coincide.

(1) The smallest topology for which ⟨1⟩ is covered by the set of two morphisms
⟨1⟩ → ⟨0⟩ : 1 ↦ 0 and ⟨1⟩ → ⟨2⟩ : 1 ↦ [1, 2].

(2) The smallest topology in which, for each n, each ⟨k⟩ is covered by the set
S_n(X^k).

(3) The topology where the covering sieves of any ⟨k⟩ are those sieves that
include S_n(X^k) for some n.

(4) The topology where the covering sieves of any ⟨k⟩ are those sieves that
include a finite family of maps ⟨k⟩ → ⟨l_i⟩ such that every map ⟨k⟩ → ⟨0⟩
factors through a map from the finite family.

(It is part of the assertion of the lemma that the collections of sieves described in
(3) and (4) are topologies.)

Proof. We first show that the family of sieves described in (4) is a Grothendieck
topology. It clearly contains the maximal sieve on any object (use the family
consisting of just the identity map). If it contains a sieve R on ⟨k⟩, witnessed by
a finite family of ⟨k⟩ → ⟨l_i⟩ as in (4), and if f : ⟨k⟩ → ⟨m⟩ is any morphism,
then those pushouts of the ⟨k⟩ → ⟨l_i⟩ along f that exist in 𝒜 witness that (4)
also contains the sieve on ⟨m⟩ induced from R, i.e., the pullback of R along f in
the sense of the dual category. (We use here
that, for any pair of maps in 𝒜 with the same domain, if they can be completed
to a commutative square then they have a pushout. This is a general property of
categories of models of universal Horn theories.) Finally, the alleged topology is
closed under composition, because if a family of maps ⟨k⟩ → ⟨l_i⟩ and, for each i,
another family of maps ⟨l_i⟩ → ⟨m_ij⟩ satisfy the requirements in (4), then so does
the family of all ⟨k⟩ → ⟨m_ij⟩. Thus, (4) is a topology.

This topology contains the sieve generated by the two maps in (1). Indeed, these
two maps themselves serve as the finite family required in (4), for any morphism
⟨1⟩ → ⟨0⟩ must send 1 to either the empty tree or a non-empty tree [t_1, t_2], and
in the former case it factors through the specified map ⟨1⟩ → ⟨0⟩ while in the latter
case it factors through the specified map ⟨1⟩ → ⟨2⟩ (via the map sending 1 and 2
to t_1 and t_2). Therefore, the topology (4) includes the topology (1).

We show next that (1) includes (2); of course it suffices to show that the
generating covers S_n(X^k) in (2) are also covers with respect to (1). For this purpose, note
that S_n(X^k) can be obtained from the identity map of ⟨k⟩ (a k-tuple of distinctly
labeled, one-element trees) by repeated use of the development process described in
the proof of Theorem 3. That is, a tree r with a particular labeled node is replaced
by two trees, r' where that node has been removed and r'' where that node has been
given two distinctly labeled children and has lost its own label. Notice that, in each
of the k-tuples arising in this development process, no labels are repeated. We show
that applying one development step to a cover in the topology (1), with no repeated
labels in its k-tuples, yields again a cover. This will clearly imply that S_n(X^k),
obtained by repeated development of a trivial cover, is itself a cover, as desired. So
consider one development step applied to such a cover. Suppose it replaces a node
labeled i in some (unique) component of a k-tuple, ⟨k⟩ → ⟨l⟩, in the given covering.
Then the pushouts in 𝒜 of the two maps in (1) along ⟨1⟩ → ⟨l⟩ : 1 ↦ i cover ⟨l⟩
(since a topology on the dual of 𝒜 is closed under pullbacks in this dual). So in the
given cover of ⟨k⟩ we may replace the map ⟨k⟩ → ⟨l⟩ by its composites with these
two pushouts, and we still have a cover of ⟨k⟩. But this replacement is precisely the
development step at label i. This completes the proof that topology (1) includes
(2).

It is trivial that all the sieves described in (3) are in topology (2). That (3) is a 
topology will follow once we show that it includes (4), for then all four items listed 
in the lemma are equal. 

So, to complete the proof, we consider an arbitrary sieve R containing a finite
family of maps f_i : ⟨k⟩ → ⟨l_i⟩ as in (4), and we show that this sieve is in (3). Fix an
integer n greater than the depths of all the labeled trees occurring in the k-tuples f_i.
We would like to develop all these k-tuples to depth n and show that the resulting
k-tuples are all in R and include all of S_n(X^k). Some caution is needed, however,
since the same label may occur several times in an f_i, and development has not
even been defined for such an f_i. We begin by extending the notion of development
to the case of repeated labels. If label z occurs several times in a k-tuple r (either
in the same component tree or in different components) and if all its occurrences
are at depths ≤ n, then we develop r at z by replacing it by two k-tuples, r' where
all nodes labeled z have been deleted, and r'' where each node labeled z has been
(unlabeled and) given two children, the left one being labeled z_1 and the right one
labeled z_2. In other words, we carry
out development in the previous sense at all z-labeled nodes in parallel, treating
them all identically. As in the proof that (1) includes (2), it is easy to see that this
definition makes r' and r'', regarded as morphisms out of ⟨k⟩, the composites of
r with two other morphisms; in particular, r' and r'' belong to the sieve R if r
does. By repeated development, the given family of maps f_i becomes a new family
of maps f'_j with the property that each tree in it has depth at most n + 1, and every
label that occurs in any f'_j has at least one occurrence at depth n + 1 in f'_j (for if all
occurrences were at depth ≤ n then we could develop further at that label). After
development is finished, any f'_j without repeated labels is a member of S_n(X^k).

We shall show that all members of S_n(X^k) must be among the f'_j's; as pointed
out above, this will suffice to complete the proof, since each f'_j is in R. Suppose,
toward a contradiction, that p ∈ S_n(X^k) is not among the f'_j's. Form an instance
t of p by a substitution that replaces the labels in p with distinct trees, all of the
same depth d > n. This t is an instance of at least one of the original f_i's, since all
k-tuples of trees are such instances. By considering the development process one
step at a time, one easily sees that t is an instance of some k-tuple at each stage
of the development; in particular it is an instance of some f'_j at the final stage.
That f'_j cannot have distinct labels, for then it would be in S_n(X^k), whereas the
unique element of S_n(X^k) having t as an instance is p, which is not an f'_j. So t is
an instance of an f'_j in which some label, say z, occurs at least twice. Let s be
the tree substituted for z in instantiating f'_j to t. Then s occurs at least twice as
a subtree in components of t, namely wherever z occurred as a label in f'_j, and at
least one of these subtrees has its root at depth n + 1 in t. So when t was obtained
by instantiating p, some label must have been replaced by s; thus, by our choice of
that substitution, s has depth d > n. Our choice of substitution also ensures that
s cannot have a second occurrence with its root at depth n + 1 in t, for at depth
n + 1 we replaced all nodes in p with different trees. Nor can a second occurrence
of s in t have its root at depth greater than n + 1, for then that occurrence would
extend to depth greater than n + d, which is the maximum depth occurring in t.
So s must have a second occurrence in t with its root x at depth e ≤ n. In p, there
is a node at the same position that x has in t (as the substitution leading from p to
t affects only depths > n), and we call it x also. If x has any labeled descendants
in p, then in t those descendants, at level n + 1, were replaced by trees of depth
d. So the subtree with root x in t has depth n - e + 1 + d > d. If, on the other
hand, x has no labeled descendants in p, then the subtree with root x extends to
depth at most n - 1 < d in p (as p ∈ S_n(X^k), so any node at depth n must have
two labeled children), and nothing changes in this subtree when we pass from p to
t. But the subtree of t with root x is s, of depth d, so both cases are contradictory.
This contradiction completes the proof that the sieve R includes S_n(X^k) and is
therefore a covering in (3). □

Let ℰ be the topos of sheaves on the dual of 𝒜 with respect to the topology
described in this lemma. We write i for the canonical geometric morphism from
ℰ to the presheaf topos 𝒮^𝒜, so i_* is the inclusion functor and i^* is the associated
sheaf functor. It follows from [9] that ℰ is the classifying topos for models of 𝕋',
the generic model being G = i^*(U).

By the hypothesis of Theorem 4, the statement "there is a pair
of inverse bijections f : P(G) → Q(G) and g : Q(G) → P(G)" is internally true
in ℰ. This implies, by virtue of the internal meaning of "there exists," that there
is an epimorphism C → 1 in ℰ such that in the slice topos ℰ/C there is a pair of
actual inverse isomorphisms between C^*(P(G)) and C^*(Q(G)), where C^* means
the inverse image along the canonical geometric morphism ℰ/C → ℰ.

We show that C has a global element 1 → C. The fact that C → 1 is an
epimorphism in the sheaf topos ℰ means that, in the presheaf topos, it has dense
range, i.e., that every object of the site is covered by arrows from objects A where
C(A) is inhabited. (Here "from" refers to the site, the dual of 𝒜; in terms of maps
in 𝒜 we would have "to" instead.) But ⟨0⟩ is covered only by its maximal sieve, as
is clear by description (4) in the lemma. So C(⟨0⟩) is inhabited, say by z. But ⟨0⟩
is initial in 𝒜, so C has a global section (in 𝒮^𝒜 and hence also in ℰ) whose value
at any object A is the image of z under C of the unique map ⟨0⟩ → A.

By taking inverse images along this section, we find that already in ℰ (without
having to pass to a slice topos) we have a pair of inverse isomorphisms
f : P(G) → Q(G) and g : Q(G) → P(G).

Now consider, in the topos 𝒮 of sets, the set T of (finite, binary) trees. It is a
model of 𝕋' in an obvious way (in fact the initial model of 𝕋'). So it is μ^*(G) for
some geometric morphism μ : 𝒮 → ℰ (a point of ℰ). Since inverse images along
geometric morphisms preserve finite products and coproducts, μ^*(f) and μ^*(g) are
inverse bijections between the sets P(T) and Q(T). To complete the proof of the
theorem, it suffices to show that they are very explicit, for then the polynomials
P(X) and Q(X) are combinatorially equivalent and, by Theorem 3, algebraically
equivalent, as desired.

In fact, it suffices to prove somewhat less. In the definition of "very explicit
function" in Section 3, we required that no label be repeated in any q_i (as part
of the notion of pattern) and that q_i contain exactly the same labels as p_i. But
we mentioned, just before Theorem 3, that one could relax these requirements, by
allowing repeated labels in q_i and by allowing labels to occur in p_i without occurring
in q_i, without affecting the very explicit bijections. This is because any function
that is very explicit in the liberalized sense but not in the original sense cannot be
a bijection. So in our present situation, it suffices to show that μ^*(f) and μ^*(g) are
very explicit in the liberalized sense, since we already know that they are bijections.

Of course it suffices to treat μ^*(f), as the situation is symmetric between the
two bijections. In fact, it suffices to treat the restriction of μ^*(f) to one of the
summands T^k in P(T), for if each of these restrictions is very explicit then so is
μ^*(f) itself. Note that such a restriction of μ^*(f) is μ^* of a restriction of f to one
of the summands G^k of P(G) in ℰ.

So, changing notation slightly, we have a morphism f : G^k → Q(G) in ℰ, and
we wish to show that μ^*(f) : T^k → Q(T) is very explicit in the liberalized sense.

Recall that G is obtained by applying the associated sheaf functor i^* to the
underlying set functor U in 𝒮^𝒜. As i^* preserves sums and products, we can write
f : i^*(U^k) → i^*(Q(U)), so f corresponds, under the adjunction i^* ⊣ i_*, to a
map of presheaves U^k → i_* i^*(Q(U)). Since U^k is a representable presheaf,
represented by ⟨k⟩, such a map corresponds via Yoneda's Lemma to an element of
(i_* i^*(Q(U)))(⟨k⟩). Here i_* i^*(Q(U)) is the associated sheaf of Q(U), regarded as
a presheaf. Inspecting the usual construction of associated sheaves in terms of
"patched together" sections, we find that any element of (i_* i^*(Q(U)))(⟨k⟩) can be
described as follows. There is a covering of ⟨k⟩ by maps ⟨k⟩ → ⟨l_i⟩, and to each
of these maps is assigned an element q_i ∈ (Q(U))(⟨l_i⟩) = Q(U(⟨l_i⟩)) in a coherent
manner. (Coherence means that if two of the maps ⟨k⟩ → ⟨l_i⟩ in the cover can be
completed to a commutative square by maps ⟨l_i⟩ → A, then the resulting images
in Q(U(A)) of the q_i's are equal.) By the lemma, we may take the covering to
be S_n(X^k) for some n; then the maps ⟨k⟩ → ⟨l_i⟩ in the covering are k-tuples of
patterns (since no label is re-used in a map in S_n(X^k)) such that every k-tuple of
trees is an instance of exactly one of them. If we call these patterns p_i, then they
and the corresponding q_i are exactly as required in the liberalized definition of a
very explicit function. It remains to check that, if we start with f : G^k → Q(G),
transform it to U^k → i_* i^*(Q(U)) and then to an element of (i_* i^*(Q(U)))(⟨k⟩), and
finally represent that element on a covering to obtain a very explicit function, as
just described, then that function agrees with μ^*(f). This verification is routine,
tedious, and therefore omitted. □

Lawvere has pointed out that the proof of Theorem 4 emphasizes the
distinction between the 2-categorical sort of universality that defines classifying topoi and
the 1-categorical sort that defines free algebras. For the proof shows that the
(2-categorically universal) generic model G of 𝕋' satisfies P(G) ≅ Q(G) only when
P(X) and Q(X) are algebraically equivalent; in particular, G^2 and G are not
isomorphic in the classifying topos. In contrast, as we remarked before the proof of
Theorem 4, the (1-categorically universal) free model T of 𝕋 in any topos,
consisting of finite trees, does satisfy T^2 ≅ T + T ≅ T.